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The Galaxy is suspected to contain hundreds of millions of binary white dwarf systems, a large 
fraction of which will have sufficiently small orbital period to emit gravitational radiation in band for 
space-based gravitational wave detectors such as the Laser Interferometer Space Antenna (LISA). 

LISA’s main science goal is the detection of cosmological events (supermassive black hole mergers, 
etc.) however the gravitational signal from the galaxy will be the dominant contribution to the data 
- including instrumental noise - over approximately two decades in frequency. The catalogue of 
detectable binary systems will serve as an unparalleled means of studying the Galaxy. Furthermore, 
to maximize the scientific return from the mission, the data must be “cleansed” of the galactic 
foreground. We will present an algorithm that can accurately resolve and subtract > 10000 of these 
sources from simulated data supplied by the Mock LISA Data Challenge Task Force. Using the 
time evolution of the gravitational wave frequency, we will reconstruct the position of the recovered 
binaries and show how LISA will sample the entire compact binary population in the Galaxy. 


I. INTRODUCTION 

Just as traditional photon astronomy has profited 
from accessing the entire electromagnetic spectrum, so 
to would gravitational wave (GW) astronomy benefit 
from detectors covering many decades in frequency. With 
existing GW observatories, high-frequency gravitational 
waves are the target of ground based interferometers, and 
the LIGO/Virgo collaboration [1, 2] have developed as- 
tonishingly sensitive observatories which are poised to 
bring GW astronomy out of its infancy. 

While small wavelength gravitational waves are, obser- 
vationally, the most readily accessible, the richest signal 
space in the GW universe is at frequencies far too low for 
Earth-based instruments. A space-borne interferometer 
is the only foreseeable way of reaping the bounty of infor- 
mation transmitted at longer wavelengths, and allowing 
gravitational wave astronomy to reach its full potential. 

Unique to the mHz frequency range is the existence of 
known sources, comprised of close binary star systems in 
the Galaxy, mostly the AM CVn-type binary stars [3] . 
These individual objects, discovered electromagnetically, 
are the near-by representatives of a much larger popula- 
tion of low frequency gravitational wave sources on our 
cosmic doorstep. Population synthesis models for the 
galaxy predict some 60 million binary star systems emit- 
ting gravitational radiation at frequencies between 0.1 
and 10 mHz [4-6] . Because gravitational waves interact 
very weakly with intervening matter, a staggeringly large 
number of these objects, distributed throughout the en- 
tire Galaxy, are within reach. 

Maximizing what can be gleaned from mHz GW data 
will require solving unique analysis problems in gravi- 
tational wave astronomy. In particular, the challenges 
posed by these galactic binaries are the sheer number 
of sources, and the degree with which they overlap one 
another in signal space, smearing together to form a con- 
fused blend of gravitational wave power which, depend- 


ing on details of the detector, can be orders of magnitude 
larger than the instrumental noise floor. The total num- 
ber of binaries which are individually resolvable out of 
this population is unknown and poses a very large dimen- 
sion model selection problem. The harm in over-fitting 
the data, (i.e, tolerating large false alarm probabilities) 
or under-fitting the data (i.e., accepting large false dis- 
missal rates) not only affects the science that can be done 
with the catalogue of resolved signals, but can also im- 
pact the data analysis efforts for more distant sources of 
gravitational radiation which share the same bandwidth. 

To prepare for this challenge, we have set out to build a 
“detection pipeline” which can automatically solve every 
facet of this problem. To wit, we need to locate can- 
didate sources in the data, select for the most parsimo- 
nious number, accurately estimate the physical param- 
eters that describe the system, and cleanly regress the 
sources from the data. While the analysis software we 
have built is flexible with regards to the details of the in- 
strument,, we use the NASA/ESA LISA mission [7] with 
the intention of participating in round four of the Mock 
LISA Data Challenges (MLDC) [8]. MLDC datasets are 
released in pairs: one coming with the list of signal pa- 
rameters within the data, and one without (henceforth, 
the “training data” and “blind data”, respectively). For 
this study, we will focus on analyzing the training data in 
which all other types of sources have been removed, leav- 
ing behind only the galactic binaries and the instrument 
noise. The capabilities of the algorithm on this reduced 
dataset will serve as a realistic demonstration of what we 
could achieve on the blind data, as the only cosmolog- 
ical sources which can impact the number of resolvable 
binaries are the brightest of the binary black hole merg- 
ers and cosmic strings, neither of which pose a serious 
challenge to existing search algorithms. 

The challenge presented by the galactic foreground has 
been addressed with iteratively more sophistication in 
past MLDCs [9], Challenge 2 was the first to simu- 
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late a complete galaxy and in response to this, Crowder 
and Cornish developed the BAM algorithm [10] which, to 
date, has been the most successful attempt at solving 
the entire problem. The BAM codes themselves have been 
lost to the sands of time, and did not model the evolu- 
tion of the binary’s orbital period (nor did Challenge 2). 
Other existing algorithms which have worn their teeth 
on the MLDC data sets include an F-statistic maximiza- 
tion scheme using a Nelder-Mead simplex algorithm [11], 
a hierarchical cleaning algorithm (also employing the F- 
statistic) [12] and an MCMC search algorithm featuring 
a Markovian delayed rejection proposal distribution [13]. 
Beyond MLDC entries there have been numerous proof of 
principal studies including (but not limited to), Refs. [14- 
16]. 

Our goal for this paper is to build a new analysis 
pipeline to handle the signal from the population of galac- 
tic binaries. We are, in effect., attempting to extend the 
BAM algorithm by including frequency evolution in our 
model of the waveforms, and technical advancements in 
GW data analysis made since 2006. 

The paper is organized as follows: §IIAnalysis 

§IIIModel §IVAlgorithm §VCatalogue §VIDiscussion 


II. DATA ANALYSIS BASICS 

There are two desired products of the data analysis 
procedure. For each model of the data (M) under con- 
sideration, we need to find the “best fit” model param- 
eters and have some sense of how well these parameters 
are constrained, as well as determine which model is most 
strongly supported by the data. 

The fundamental piece of math at work here is Bayes' 
theorem which, when cast as an inference problem, takes 
the form 


p(9\s, M) = 


p(s\9. M)p{9\M) 
p(s\M) 


( 1 ) 


The left hand side of equation 1 is the posterior distri- 
bution function for parameters 9 given the data s, from 
which parameter estimation conclusions can be drawn 
within model M. (see, for instance, [17] for a thorough 
introduction). The numerator of the right hand side con- 
tains the product of the likelihood - our “goodness of fit,” 
measure - and the prior, encoding our knowledge before 
the new data were collected. The denominator is the ev- 
idence, or marginalized likelihood for M. Comparisons 
of p(s|A4) between different models reveal which is most 
strongly supported by the data and, assuming uninfor- 
mative priors on the models themselves, should be taken 
as the preferred representation. 

In the Bayesian framework, one needs only to define 
the likelihood and prior distributions, and the rest of 
the analysis is reduced to an oft time-consuming calcu- 
lation. We prefer the Markov Chain Monte Carlo fam- 
ily of algorithms to perform the computation, although 


Nested Sampling, and its offspring MultiNest, have been 
used to similar affect (Ref. XXX ,XXX) . There is no short- 
age of data analysis literature describing the concept of 
MCMCs. However, for the sake of introducing vocabu- 
lary and notation, we briefly sketch the algorithm. 

We begin with some random position in parameter 
space 9 X as the first “link” in the chain by evaluating 
its likelihood p(s\0 x ,M) and prior probability p(0 x \M) . 
Next, we suggest a trial position, 9 y , from a proposal dis- 
tribution q(9 v \9 x ). The new likelihood and prior proba- 
bility are evaluated and 9 y is adopted as the next sample 
in the chain with probability ag w g = min[l. II g z ] 
where Hg j is the Hastings ratio 


. p{s\e y ,M)p{e y \M) q {d x \e y ) 
'* p{s\9 x ,M)p{9 x \M)q(9 y \9 x ) 


(2) 


This process of stochastically stepping through parame- 
ter space repeats until some convergence criteria are sat- 
isfied. Equation 2 is derived from the detailed balance 
condition, which requires that the probability of being 
at state 9 X and transitioning to state 9 y is the same as 
being at. 9 y and moving to 9 X . If satisfying this condition 
when adopting new solutions in the chain, the number 
of iterations spent in a particular region of parameter 
space, normalized by the total number of steps in the 
chain, yields the probability that the model parameters 
have values within that region. 

The choice of q{9 y \9 x ), by construction, does not al- 
ter the recovered posterior distribution function. The 
proposal distribution does, however, dramatically affect 
the acceptance rate of trial locations in parameter space 
and, therefore, the number of iterations required to sat- 
isfactorily sample the posterior. To help mitigate the 
multi-modal structure of the target distributions, we in- 
clude parallel tempering [18] - a set of chains, running 
simultaneously, each at a higher “temperature” - as is 
becoming standard in GW applications of MCMCs [? ? 


We use a trans-dimensional (or “reverse jump”) 
Markov Chain Monte Carlo (RJMCMC) [19, 20] as the 
tool for sampling the target posterior distribution func- 
tion in model space. The RJMCMC is novel in its ability 
to move between models, with the number of iterations 
the chain spends in a particular model proportional to 
the marginalized likelihood for that model. This class of 
MCMC algorithm has previously been used to study the 
galactic binary detection problem on toy problems [21] 
and smaller data sets [15, 16], but has yet been turned 
loose on the full galaxy. 
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III. THE DATA MODEL 


We model the data s as having two contributions: 


M 

S K = n K {fj K ) + ]T h\{ X). (3) 

i 

The instrument noise, n, is assumed to be stationary and 
gaussian with colored spectral density parameterized by 
if K . The gravitational wave component of the data is 
the superposition of M gravitational wave templates in, 
each parameterized by A which will be described in de- 
tail later. The subscript k denotes the different interfer- 
ometer channels synthesized from the LISA phase-meter 
data. We use the usual noise orthogonal AET chan- 
nels [22]. Because we expect all of the galactic binaries 
to be well below the LISA transfer frequency /* = c/27 rL 
we can neglect the T channel which is, in effect, GW-free 
for / < /.. 


1. The waveform model 

Galactic binaries in the LISA band are expected to 
exhibit relatively little frequency evolution during the 
lifetime of the mission. Thus, the phase of the GWs 
emitted from the binary can be safely approximated as 
$(t) = ifo+‘hx fot + Tx fot 2 + ... where higher order deriva- 
tives of / can be neglected for binaries below ~ 9 mHz 
during a five-year-long mission [23], For this work, we 
fix fo = 0, as is the case in the MLDC data simulations. 

Given these assumptions about the phase evolution of 
the binary, as well as restricting the templates to circular 
orbits (perhaps a dubious constraint Ref. XXX), We can 
fully describe a GB waveform with eight parameters: 

% —* (Vcn fo, Q, <P, -4oi L i*, <Po') (4) 

where the subscript 0 indicates the value taken at the 
first time-sample in the data. Parameters { 6 , cp} describe 
the sky-location of the binary in ecliptic coordinates, and 
{/,, ip, ipa) are angles that fix the orientation of the binary. 
The amplitude 


4 2 A4 5/3 (tt/ 0 ) 2/3 

51 


( 5 ) 


couples the chirp mass M and luminosity distance Dl 
preventing the independent measurement of either quan- 
tity. If the binary orbital evolution is driven purely by 
the emission of gravitational waves (as opposed to, for in- 
stance, mass transfer between the individual stars in the 
binary), then the linear term in the frequency evolution 
depends only on the frequency and chirp mass via: 


/ = 


— 7r 8 / 3 A4 5 /3/ n /3 
5 


(6) 


Sources which satisfy this condition wall henceforth be 
referred to as detached binaries. For this data set, we 
have the advantage of knowing that any binaries with 
fo > 0 are detached, allowing us to make a determination 
of Di. When analyzing data from the Galaxy itself, we 
will not have this foresight. For the rare cases where 
fo can also be measured, systems being driven only by 
the emission of gravitational radiation must satisfy the 
braking index condition Ref. XXX 



( 7 ) 


LISA’s ability to measure fo, and how ignoring this pa- 
rameter impacts the data analysis, has been preliminarily 
explored in [23] and [24]. We will address this important 
detail in a follow-on study [25]. 

To compute the instrument response to a particular 
galactic binary signal we use the fast-slow decomposition 
as detailed in [26]. 

We utilize a prior on the location of any given bi- 
nary constructed from the number density of stars in the 
galaxy. Our model for the galactic distribution is simi- 
lar to that from which the MLDC datasets were drawn 
Ref.XXX. The density profile has two components, one 
from the disk and one from the bulge: 


Pbulge 


Pdisk 


(x/tt R h f 




-sech 2 


"gc 


iir R 2 d Z ( r~~ \Z d/ 

Pgalaxy = dpbulgc + (1 ~ d) Pdisk 


(8) 


where r| c = x\ c + y 2 c + z 2 c , u 2 c = x 2 c + y 2 c , and 
{•Cgo Vgc, %c} are the cartesian galactic coordinates of 
the source. For our purposes, the parameters of the 
galaxy distribution used are [27] 


{ R b . R d , Z d , A} = {690 pc, 2520 pc. 302 pc, 0.24} (9) 


Using this distribution we build a joint prior 

P^fo,fo,^, ( p,Ao) = G Pgalaxy (40) 


with a complicated normalization constant C that we 
approximate by Monte Carlo integration over the prior 
volume. The quantities not constrained by this prior are 
the orientation parameters, which we take as having uni- 
form a priori distributions over [0, 27r] for ip and ipo, and 
[—1,1] for cost. 

For templates of detached systems we determine the 
distance to the binary using equations 5 and 6. For mass- 
transferring binaries we have no way of making such a 
determination without better understanding of the or- 
bital dynamics. We construct a separate prior for such 
systems, marginalizing Eq. 10 over fo, fo, and do leaving 
behind a prior on { 6,<j > } only and uniform distributions 
on the marginalized parameters. 
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2. The noise model 


IV. THE ALGORITHM 


Given nominal levels for the shot- and acceleration- 
noise (S s and S a ), the baseline noise power spectral den- 
sity for the LISA A and E channels is 


Sn(f) 


3 sm T 


2 + cos 


/. 


+ 2 (3 + 2cos <^ 4J 


( 11 ) 


To this we must add an estimate of the confusion noise 
S c which is derived from data simulations 
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( 12 ) 

To allow for modeling error in the noise levels, and the 
vagaries of the particular noise realization in the data, we 
include parameters which characterize departures from 
this theoretical noise power spectral density as described 
in [15] . A separate noise level is defined for each of several 
narrow bandwidth segments of data, each of length A N b 
frequency bins. The i th segment is rescaled as S n (f) — *• 
ViSnif)- For Gaussian noise, the expectatio n valu e for 
t? when measured over IVn b bins is <x^ = 1 / A N b • We 
accommodate for additional ignorance with respect to 
the noise level by using a normal distribution A r [l,4<7~] 
as the prior on each iji. 


A. Overview 

In this section we will describe each phase in the 
pipeline. Following a coarse overview of the entire proce- 
dure. a detailed, step-by-step description of the algorithm 
can be found in the subsections. 

The data are first divided into small bandwidth sub- 
sets, each of which is independently analyzed. Each data 
window is studied iteratively. At each iteration, we in- 
clude an additional template in the model until we reach 
the maximum evidence. For each iteration, we apply a 
variety of tricks to increase the efficiency of the search, 
many of which spoil the statistical properties of the chain. 
After the “illegal” chains have located the modes of the 
target posterior distribution function they are used to 
produce proposal distributions for the subsequent steps 
in the search where we take care to satisfy detailed bal- 
ance. The evidence is determined by including the num- 
ber of templates in the model as a parameter. For ex- 
ample, during iteration I the RJMCMC is allowed to 
move between models containing 0 < i < I templates. 
The model in which the RJMCMC spends the most it- 
erations, AfJnax, is the one with the highest evidence. If 
i m ax < I then that window is finished being analyzed, 
and the MAP parameters from model < I are 

stored in the master list for the full dataset. 


B. Preparing the Data 


3. The likelihood function 


With the noise and signal models now declared, the 
likelihood p(s \0) is computed over N Fourier bins, built 
from the assumption of colored Gaussian noise where 
the noise power spectral density is being fit over A seg = 
A/Anb narrowband segments of data, via: 


lnp(s|0) 


A.E 

E 


N, ( 


(r K \r K ) + N NB 


• (13) 


The residual 


XI 


r K =s k -22K(Z) 


(14) 


appears in equation 13 inside of the noise- weighted inner 
product defined as 


2 y- d’‘(/)fo(/ ) + a(f )b*(f ) 
Fobs j V(f) s n{f ) 


(15) 


where, as described above, ?/ takes the same value over 
Vnb Fourier bins. 


The bandwidth of a typical galactic binary waveform 
is sufficiently small that the data can be divided into 
small subsets, or windows, each of which spans a rel- 
atively small range in frequency, and each window can 
be analyzed independently. For practical purposes, this 
makes it simple to scale the analysis code from a test- 
ing platform to running on the full dataset. While many 
CPUs are required to process all of the data, they do not 
need to talk to one another while doing so. 

The galactic binary search will be hindered at low fre- 
quency by power from high SNR black hole binaries in 
the data, as well as bursts from cosmic strings. Both 
the black hole binaries and the cosmic strings have very 
unique time-frequency characteristics. This means the 
brightest sources can be cleanly removed from the data 
without marring the signal from the galaxy. The accu- 
rate detection of black holes has been a main theme of 
LISA science studies and is definitely a manageable task, 
while the high-accuracy detection of the cosmic strings 
has been successfully demonstrated in the previous round 
of the MLDC [? | . We do not see the removal of black 
hole or cosmic string sources as a substantial technical 
challenge, and so have focused the efforts in this paper 
on training data with only instrument noise and the sig- 
nal from the galaxy. For the blind data analysis to follow 
this work, a collaborative effort will be needed to perform 
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this first cleaning step. 

In each window, care needs to be taken at the edges 
as templates will try to fit signals from adjacent data 
segments which have power “leaking" into that which is 
being analyzed. Thankfully this was addressed in the BAM 
algorithm by having a smaller acceptance region within 
each window where the initial frequency must fall if the 
source is going to be taken as a true detection. Bordering 
this acceptance region to the edges of the window are 
the “wings” of the data segment where templates are 
included in the model but the sources they recover are 
not stored as detections. Adjacent windows are tiled so 
that the end of one acceptance region is the beginning of 
another. Thus we have full coverage of the data without 
any overlap between acceptance regions, and without the 
risk of double-counting sources that happen to lie at the 
interface of two data segments. Figure 1 shows a cartoon 
depiction of a data window. 

The size of the windows is not something that can 
be fixed for all signals. The amplitude of a galactic 
binary waveform scales as / 2 / 3 , meaning the discrete 
Fourier transform of a signal typically has significant 
power across a larger bandwidth as we move to higher fre- 
quency data. Furthermore, for detached binaries (which 
make up the bulk of the galactic population) / scales 
as / 11/3 so signals with high initial-frequency typically 
spread their power over more bins during the course of 
the observation. 

Because of this frequency-dependent bandwidth, the 
size of the wings and, for efficiency's sake, the size of the 
windows, is frequency dependent. 


C. The Search Phase 

The purpose of the search phase is to rapidly locate 
the sources in the data (i.e., the modes of the posterior 
distribution function). Here we are not concerned with 
satisfying detailed balance, or producing samples which 
represent the posterior, but instead are focused on effi- 
ciency. The two most substantial cost saving enhance- 
ments come from reducing the dimension of the search 
by maximizing the likelihood over “extrinsic parameters” 
(orientation and distance) using the F-statistic [28, 29], 
and by making the signals a bigger target in the search 
space through simulated annealing. 

Simulated annealing is another common trick per- 
formed when using Markov Chain Monte Carlo-like 
methods to rapidly locate the modes of the distribution. 
It works by initially suppressing the influence of the like- 
lihood terms in the Hasting's ratio (Eq 2) by “heating” 
the distribution being searched 

p{s\6y) { p(s\6y) V (16) 

P(s 'fix) \v{s\dx)j 

with the exponent playing the role of an inverse “tem- 
perature” B = 1/T with 0 < (3 < 1. Early iterations of 
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FIG. 1: A cartoon depiction of a data window to be 
independently searched. The shaded portions are where 
detected binaries will be excluded from the master list. 
All templates within the same Signal Block are updated 
simultaneously. One parameter r\ is used to measure the 
noise PSD over each Noise Block. This concept was 
derived for the BAM algorithm and adapted from 
Ref. [10] 


the chain are run at high temperature (small 3) forcing 
the likelihood ratio term to ~ 1. The influence of the 
likelihood is gradually increased as the search space is 
gradually “cooled” at each iteration i via 


0=i (ft) 7 max 0<1<T 

(17) 

(1 i > t 



until 0 goes to 1 and the chains are sampling the target 
distribution. 

Simulated annealing requires some tuning in order for 
it to actually improve the search’s efficiency. The ad- 
justable parameters to the annealing scheme, the maxi- 
mum temperature T max and the cooling time r, need to 
be custom suited for each problem. Conceptually, what 
we want is for T max to be high enough that chains are 
able to freely explore the full prior range, while not being 
so absurdly hot that we spend many iterations with no 
hope of locking on to any of the signal. A reasonable rule 
of thumb is that the effective SNR of the signals as seen 
by the tempered chain is reduced from the true SNR by 
a factor of ~ 1/y/T. To this end. we want T max to be 
~ SNR 2 , and can reasonably determine it based on the 
excess Fourier power in the data: 

T max = {s\s)-AN. (18) 

Setting the cooling rate r is an exercise in trial and error, 
and depends on the bandwidth of the data window, with 
higher frequencies warranting longer cooling times. 
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In addition to simulated annealing, and in all other 
phases of the analysis, we use the now commonplace 
method of “parallel tempering” where multiple chains 
are run simultaneously at different temperatures, with 
exchanges of parameters between chains subject to the 
detailed balance condition. 

To further increase the efficiency of the search, we wish 
to reduce the volume of the search space. For this pur- 
pose, the F-statistic is a tool which has proven extremely 
useful in LIGO/ Virgo searches [30], black hole searches 
for LISA [31], and galactic binaries [10, 29? ]. Because 
of it’s frequent use in the GW data analysis literature, 
we leave the details to the aforementioned references. 

While the speed-up in the search time when using the 
F-statistic is substantial, once the (approximately) best- 
fit frequency and sky location for each source in the 
model have been located, it becomes a liability. For a 
model containing Ns sources, a single F-statistic eval- 
uation involves 4N$ calls to the waveform generator as 
well as a 4iVs x 4iVs matrix inversion, ultimately costing 
more than 4 Ns likelihood evaluations. Therefore, once 
the F-statistic has done its job and found the modes, the 
chains are much more efficient reverting to the Gaussian 
likelihood as described in §11. The points of the chain 
from the F-statistic search are discarded as the “burn 
in” samples. 

At this point we are interested in producing an en- 
semble of samples that approximate the target posterior 
distribution function. However, we are not yet ready to 
abandon some of our cost saving measures in favor of de- 
tailed balance. The posteriors for these signals are mul- 
timodal and in a few percent of trial runs, the burn-in 
phase ends on a secondary maximum of the distribution. 
The nature of these near-degeneracies was explored with 
detail in [10]. To summarize, the orbital motion of the 
LISA constellation, as well as the finite number of data 
points, imparts a harmonic structure to the waveforms 
on either side of the initial frequency of the binary. The 
templates can fix themselves to a sub-dominant harmonic 
while still achieving overlaps with the injected signals of 
> 70%. 

Such features in the posterior expose the weaknesses 
of an “out. of the box” MCMC. While the chain is guar- 
anteed to eventually converge, in most cases we are not 
willing to wait long enough. While there are certainly 
more elegant ways of overcoming these types of chal- 
lenges, we offer a brute-force approach inspired by de- 
layed rejection [32, 33]. We propose some position in pa- 
rameter space ti y and, without asking the Hastings ratio 
for any input, temporarily adopt this position and evolve 
the chains from there. After some number of updates 
the chain arrives at O' . We then look back and calculate 
the transition probability ag, . This transition prob- 
ability does not satisfy the detailed balance condition, 
and so the samples from the chain will be biased in some 
way. For a lot of effort, delayed rejection performs this 
type of exploration while preserving detailed balance as 
described in detail within a gravitational wave data anal- 


ysis context by Trias et al in Ref [13]. For our purposes, 
we are not yet concerned with the statistical properties 
of our chain and accept the fact that our recovered dis- 
tributions will be biased. 

Our lazy implementation of delayed rejection is per- 
fectly suited to prevent us from sticking on a sub- 
dominant harmonic of the waveform. The secondary 
modes appear at integer multiples of the LISA modu- 
lation frequency f rn = 1/year. We propose a shift in 
frequency by some n/ m , where n is a random integer 
drawn from U[-6,6], adopt that solution, and then allow 
the chains a few iterations to refine the remaining param- 
eters (in particular, the sky location) before comparing 
back to the current solution. This addition to the search 
procedure mitigates the instances of recovered sources 
having the central frequency on a harmonic induced by 
the orbital motion. 

Figure 2 is a scatter plot of chain samples in the 
p(s|$) — / plane. The plot is centered at the true fre- 
quency of the injected source. The sub-dominant modes 
of the distribution are clearly visible as peaks in the 
likelihood surface occurring at even integer multiples of 
} m T = 2 Fourier bins. In this example, the search chain 
approached from higher frequency. The multi-modal 
structure of the likelihood surface due to orbital motion 
is roughly symmetric about the injected value. A single 
Markov chain without the benefit of delayed rejection or 
parallel tempering would be severely challenged by these 
local maxima. 



(f-fjnj) X T 

FIG. 2: Delayed Rejection at work: This scatter plot 
shows samples from a search chain in the p(s[$) — / 
plane. The plot is centered at the true frequency of the 
injected source. Delayed rejection enables the chain to 
efficiently leave a secondary maximum in favor of the 
true source parameters. 
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D. Parameter Estimation & Model Selection 

Upon the completion of the search phase, the chains 
have produced samples of some biased distribution func- 
tion, the modes of which closely correspond to that of 
the target posterior. To accurately refine the characteri- 
zation of the model and to evaluate wether or not it is the 
one favored by the data, we run a RJMCMC following all 
of the rules to ensure the chain samples are representa- 
tive of the target posterior. RJMCMCs are notoriously 
difficult to work with in high-dimension problems. The 
proposal distributions have to be informative enough that 
drawing a random point out of (for this example) an 8 
dimensional parameter space produces a reasonable fit to 
the data, but are not so constraining that the improved 
fit to the data is overly penalized by how strongly you 
forced the trial solution to that point. For these tech- 
niques to work efficiently, the chains can not tolerate a 
proposal that is anything but a good approximation to 
the posterior. 

Fortunately for us, we spent the search phase produc- 
ing such an approximation. While these chains could 
not be used (ethically) for model selection or parameter 
estimation, there are no rules against employing them 
to build suitable proposal distributions. This concept 
was originally suggested by Green [20] and we have de- 
scribed in detail the procedure applied to this work in 
Ref. [15]. In short, we bin the chains into an 8D grid us- 
ing fisher matrix estimates of the parameter variances to 
set the cell size in the grid. The number of samples from 
the chain that land in a given cell is proportional to the 
probability density in that cell. The proposal distribution 
randomly chooses a cell weighted by its probability, and 
then uniformly draws within the cell to come up with the 
parameters. Recently, Farr and Mandel [34] introduced 
an improvement on how to bin the chain samples by us- 
ing a kD-tree data structure instead of our Fisher-scaled 
grid. It’s clear that this will further improve the effi- 
ciency of these proposals by eliminating the dependence 
on the grid size, something which we had to carefully 
tune. We will transition to the kD-tree decomposition in 
future work. 

As mentioned previously, this analysis is performed it- 
eratively, including an additional signal in the model at 
each iteration. During the characterization phase of the 
procedure on iteration / the RJMCMC is exploring mod- 
els containing anywhere between 0 and I signals. The 
model with the strongest support is the one in which the 
RJMCMC spends the most iterations. When two models 
are similarly supported by the data a more subtle selec- 
tion has to be made. Between any two models we can 
calculate the Bayes factor 

^ _ p(s\Mj) _ # of iterations in Mj . ^ 

'' p(s\M.i) 4r °f iterations in M t 

which is easily interpreted as the odds (ignoring prior 
preferences for one model over another) that M j ) is pre- 
ferred over Mi. Thus. By ~ 1 means that both models 


are similarly supported by the data. While that is a 
nice interpretation, when assembling a catalogue of de- 
tectable galactic binaries and producing a residual fit for 
subsequent searches, decisions need to be made about 
which model to pick in these marginally distinguishable 
cases. This, sadly, somewhat reduces the Bayes factor 
to a statistic used for model selection. Nevertheless, we 
have to draw a line in the sand, and have chosen for this 
study a Bayes factor “threshold” of 12:1 needed to prefer 
a higher dimensional model, above which the support for 
Mj is canonically considered “strong” [35]. 

A more satisfying thing to do would be to repeat the 
analysis for several different By cutoffs, producing ap- 
pendices to the final source catalogue of more specula- 
tive detections, but the time for such an exercise was not 
warranted at this juncture. 

If the highest dimension model is supported by the 
data, we store the map parameters and begin another 
iteration. If not, the MAP parameters from the winning 
model are stored and no more iterations on that win- 
dow are performed. Only sources with initial frequency 
inside the acceptance region are added to the “master” 
list of detections. Figure 3 shows ~ 5 windows’ worth of 
training data, with the best fit waveforms over-plotted. 
A higher frequency window was chosen so that the dif- 
ferent signals in the fit could be distinguished. 



7 7.005 7.01 7.015 7.02 7.025 7.03 7.035 7.04 

f (mHz) 


FIG. 3: Powder spectral density of a small bandwidth 
segment of the training data. This figure spans ~ 5 
search windows. The green [dashed] line is the original 
data, while the red [solid] line shows the best fit signal 
model. 


E. Source Subtraction 

The biggest cost to performing these MCMC runs is 
the waveform calculation. While our waveform model is 
extremely efficient, there is not much that can be done 
about the colossal number of templates that need to be 
computed to resolve 10000+ sources. To help mitigate 



this expense, we want to hold fixed the waveform param- 
eters for sources which are not significantly overlapping 
other sources in the window. 

After each iteration, if the new source is within some 
pre-defined number of frequency bins (depending on the 
characteristic bandwidth of sources in that window), 
than on the following iteration this template, along with 
any others that satisfy the closeness condition, and the 
new source included in the model, are allowed to vary. 
Otherwise, the signals located in previous iterations are 
held fixed. This keeps the number of “active” sources per 
window, per iteration, much lower than the total number 
of detectable binaries in that segment of data. We are 
essentially searching on the residual of the data from pre- 
vious iterations, but with the caveat that if new sources 
get too close to existing detections than all nearby sig- 
nals need to be simultaneously re-characterized. Figure 4 
shows the same data segment as in Figure 3, but with 
the best fit model regressed from the original data. The 
residual is consistent with stationary Gaussian noise at 
the level of the instrument noise. 
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FIG. 4: Same as Figure 3, but with the red [solid] line 
replaced by the residual power spectral density after the 
best fit model has been regressed from the data. 


V. THE RECOVERED SOURCE CATALOGUE 


We performed a comprehensive test run of the algo- 
rithm on the MLDC challenge 4 training data set, con- 
taining only instrument noise and the signals from the 
simulated galaxy. In the interest of saving time for the 
blind data, the search was not carried to completion in all 
frequency windows. In this section we quantify the per- 
formance of the algorithm by comparing the recovered 
catalogue to the source list supplied with the training 
data. 


1. Evaluating the recovered catalogue 


. The total number of detected galactic binaries that 
we located in the data before moving on to the blind data 
set was ~ 9000. The merit of our recovered catalogue was 
judged using the MLDC challenge 3 evaluation software 
available as part of the lisatools software package [? ]. 

For each recovered source /i rec , the evaluation software 
searches through the list of (SNR > 1) sources in the sim- 
ulated galaxy, and determines which injected signal /q n j 
gives the lowest noise-weighted residual (h- m j — h iec \hi n j — 
h rcc ). Once h TCC is paired with the corresponding h,- m j the 
correlation between the two waveforms 


Corr 


(fynj | /irec) 


inj.rec 


V (/'inj|/iinj)(/i rec | h T( 


(20) 


is computed and stored in a “report card” for the recov- 
ered catalogue. A histogram of the correlations for our 
recovered population of sources is depicted in Figure 5. 



correlation 


FIG. 5: Correlations between recovered sources and 
their corresponding signal in the training data. Of the 
~ 9000 detections over 90% had a correlation above 0.9 
with an injected signal. 


Along with the correlation files, the MLDC evaluation 
software saves the parameter error between each recov- 
ered signals and its corresponding source in the data. 
The distribution of errors for the entire catalogue will 
reveal any systematic biases in the parameter recovery. 
Figure 6 shows the distribution of recovered parameter 
biases. We are pleased to report that all parameters show 
a strong peak at zero bias. 

The galaxy search serves the dual purpose of recovering 
a trove of information about the galaxy, and removing a 
substantial amount of foreground signal-pow’er, facilitat- 
ing the search for signals at cosmological distances. The 
residual powder spectral density of the training data after 
having removed the recovered signals is shown in Fig- 
ure 7. The dashed [green] trace shows the galaxv-only 
training data (without BHBs, EMRIs, etc.), while the 
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FIG. 6: [Top left] The distribution of SNRs for the recovered signals. The x-axis is in log 10 SNR.. [All others] 
Distributions of the difference between the recovered signal parameters and the associated source in the data. We 
see no significant systematic biases between our best fit parameters and true values for the recovered sources. The 
orientation parameters of the binary are typically not well measured, hence the larger tails on the error distributions. 


solid [red] line is the residual after our recovered cata- 
logue has been regressed from the data. Notice we only 
performed the search out to 0.01 Hz. At higher frequen- 
cies the signals are typically bright and isolated. This 
makes them easy to find, but computationally expensive 
to characterize, as the bandwidth of the signal increases 
with frequency and amplitude. Individual high frequency 
windows have been tested and will be included in our 
blind search, but we omit them from the results here. It 
is clear from the residual that there are more detectable 
sources than the 9000 on which we are reporting. How- 
ever, as a test of the algorithm we take this as a satisfying 
result. 


2. Galactic binary astronomy from the recovered catalogue 

. While the focus of this work has been the algorithm 
and its performance on simulated data, the real fun be- 
gins when we brandish the source catalogue as an un- 


precedented astronomical tool. Our follow on work to 
this paper will address some specifics, but in the mean 
time there is one piece of “low hanging fruit” that is too 
good to pass up. As mentioned in §??, the inclusion of fo 
in the waveform model presents us with an opportunity 
to disentangle the luminosity distance from the overall 
GW amplitude. To demonstrate this capability, we se- 
lect from the catalogue any binary with SNR>15 and 
foT 2 > 1 and compute the Dg using Eqs ??. The re- 
sulting galactic map is shown in Figure 8. As promoted 
in the LISA science case studies [? ] , we recover a three- 
dimensional map of the entire Galaxy. 


Admittedly, we artificially benefit from knowing that 
any binary with fo > 0 is dynamically evolving under ra- 
diation reaction forces only. How this assumption biases 
the spatial distribution of the detected population is a 
top priority of our subsequent studies. 
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FIG. 7: Power spectral density of the training data 
(dashed/green) and residual (solid/red) after removal of 
the recovered sources. The remaining spikes in the 
residual are detectable signals left behind after we 
prematurely halted the search in favor of analyzing the 
blind data. 


VI. DISCUSSION 

Using [10] as a foundation, we have developed our own 
galactic binary detection code and tested it on the round 
4.0 training data supplied by the Mock LISA Data Chal- 
lenge Task Force. New features to this search algorithm 
are the inclusion of parallel tempering and delayed re- 
jection to increase the efficiency of the search, and using 
a trans-dimensional MCMC to simultaneously character- 
ize the detected binaries, and determine the most likely 
number of binaries in each small-bandwidth segment of 
data being analyzed. We also include the /o parameter 
in the waveform modeling which was not part of the orig- 


inal work by Crowder and Cornish, but was included in 
the MLDC round 3 data and subsequent entries. 

The codes were written for the purpose of participat- 
ing in the blind challenge, so the analysis of the train- 
ing data was halted before completion. Before moving 
on to the challenge data, we recovered ~ 9000 sources, 
90% of which matched one of the injected sources with 
correlation greater than 0.90. It is worth noting that 
a smaller-scale space borne gravitational wave detector 
is more likely to fly in the near future than the full 5 
Gm LISA mission concept. If anything, that makes our 
current search software overqualified to handle whatever 
data we are presented. 

Of course, detecting the sources is the first step to- 
wards using the data to its full potential, as a unique 
probe of the Galaxy. 

Including the first time derivative of the frequency in 
the waveform model allows us to compute the distance to 
binaries whose dynamics are not affected by mass trans- 
fer or tidal effects. From the recovered source catalogue, 
we measured the sky-location and / well enough to con- 
strain Dl of ~ 1000 binaries, sampled from the entire 
volume of the Galaxy. This result hints at the potential 
for low-frequency gravitational wave astronomy to offer 
an unprecedented view of compact stellar remnants are 
distributed. Implicit in the calculation of the distance is 
the supposition that all mass-transferring systems have 
/ < 0. How our ability to map the distribution of the 
Galaxy after relaxing this assumption is impacted will be 
the focus of follow-on work. 
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